When Is Word Sense Disambiguation Difficult? A Crowdsourcing Approach

Authors

Krishna N. Kaliannan

Adam Kapelner

Krishna Kaliannan

Dean Foster

Lyle Ungar

Abstract

We identified features that drive differential accuracy in word sense disambiguation (WSD) by building regression models using 10,000 coarse-grained WSD instances that were labeled on MTurk. Features predictive of accuracy include properties of the target word (word frequency, part of speech, and number of possible senses), the example context (length), and the Turker’s engagement with our task. The resulting model gives insight into which words are difficult to disambiguate. We also show that having many Turkers label the same instance provides at least a partial substitute for more expensive annotation.

Disciplines: Business

This working paper is available at ScholarlyCommons: http://repository.upenn.edu/wharton_research_scholars/116

Adam Kapelner, Krishna Kaliannan: The Wharton School of the University of Pennsylvania, Department of Statistics, 3730 Walnut Street, Philadelphia, PA 19104, {kapelner, kkali, foster}@wharton.upenn.edu

Dean Foster, Lyle Ungar: University of Pennsylvania, Department of Computer Science, 200 S. 33rd St, 504 Levine, Philadelphia, PA 19104, [email protected]

Similar resources

Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels

Word sense disambiguation aims to identify which meaning of a word is present in a given usage. Gathering word sense annotations is a laborious and difficult task. Several methods have been proposed to gather sense annotations using large numbers of untrained annotators, with mixed results. We propose three new annotation methodologies for gathering word senses where untrained annotators are al...

A Methodology for Word Sense Disambiguation at 90% based on large-scale CrowdSourcing

Word Sense Disambiguation has been stuck for many years. In this paper we explore the use of large-scale crowdsourcing to cluster senses that are often confused by non-expert annotators. We show that we can increase performance at will: our in-domain experiment involving 45 highly polysemous nouns, verbs and adjectives (9.8 senses on average), yields an average accuracy of 92.6 using a supervise...

Crowdsourced Word Sense Annotations and Difficult Words and Examples

Word Sense Disambiguation has been stuck for many years. The recent availability of crowdsourced data with large numbers of sense annotations per example facilitates the exploration of new perspectives. Previous work has shown that words with uniform sense distribution have lower accuracy. In this paper we show that the agreement between annotators has a stronger correlation with performance, a...

Colors of People (Les couleurs des gens) [in French]

In Natural Language Processing and semantic analysis in particular, color information may be important in order to properly process textual information (word sense disambiguation, and indexing). More specifically, knowing which colors are generally associated with terms is crucial information. In this paper, we explore how crowdsourcing through a game with a purpose (GWAP) can be an adequate st...

Crowdsourcing Word-Color Associations

In Natural Language Processing and semantic analysis in particular, color information may be important in order to properly process textual information (word sense disambiguation, and indexing). More specifically, knowing what colors are generally associated with what terms is crucial information. In this paper, we explore how crowdsourcing through a game with a purpose (GWAP) can be an adequat...

Publication date: 2015
